Skip to content

fix(vm_service): state machine does not run reconcileMachineAddresses#721

Merged
wikkyk merged 8 commits intomainfrom
710_fix_proxmoxmachine_status_addresses
Apr 16, 2026
Merged

fix(vm_service): state machine does not run reconcileMachineAddresses#721
wikkyk merged 8 commits intomainfrom
710_fix_proxmoxmachine_status_addresses

Conversation

@65278
Copy link
Copy Markdown
Collaborator

@65278 65278 commented Apr 13, 2026

Issue #, if available:
#710
#714

Description of changes:
The state machine was set to the wrong state after reconcilePowerState, therefore proxmoxmachine.Status.Addresses was never populated. Our tests for reconcileMachineAddresses succesfully ran as the code itself was fine. Our E2E tests did not fail because we do not depend on this field.

To catch these kinds of errors in the future, I've made a new test for ReconcileVM which tries to test its workings in their entirety.

Testing performed:
make test

Comment thread internal/service/vmservice/vm_test.go Outdated
@65278
Copy link
Copy Markdown
Collaborator Author

65278 commented Apr 13, 2026

I've added a CloneVM mock to the testcase, now it's rather complete. The only thing that could be added would be a full bootstrap.go test, but that's a full can of worms I'll leave to somebody else to implement.

Now we know that you need to requeue at least 4 times to deploy a capmox machine.

@65278 65278 force-pushed the 710_fix_proxmoxmachine_status_addresses branch from 2cea172 to ade1f57 Compare April 14, 2026 12:40
@65278 65278 force-pushed the 710_fix_proxmoxmachine_status_addresses branch from ade1f57 to e137c7c Compare April 14, 2026 13:20
@65278 65278 force-pushed the 710_fix_proxmoxmachine_status_addresses branch from 3302866 to f9c2804 Compare April 14, 2026 14:17
@65278 65278 force-pushed the 710_fix_proxmoxmachine_status_addresses branch from f9c2804 to 2e137c6 Compare April 14, 2026 16:56
@65278
Copy link
Copy Markdown
Collaborator Author

65278 commented Apr 14, 2026

Actual deployment performed. It turns out we had problems way deeper than the initial problem:

A qmstart task always fails on our proxmox. The VM starts anyway with two different errors. I've made it so that taskfailure now knows about qmstart and treats it as a special case.
The failure in tasks caused the state machine to always run incorrectly even after the naive fix. This is also fixed now.

Because this triggered the quality gate, I've had claude add a test for task.go, even though we're only testing invariants and the mock kubernetes client.

@65278 65278 force-pushed the 710_fix_proxmoxmachine_status_addresses branch from 2e137c6 to 22fd914 Compare April 14, 2026 17:40
Copy link
Copy Markdown
Member

@mcbenjemaa mcbenjemaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just some remarks,
The test added in vm_test is great.

Comment thread internal/service/taskservice/task_test.go Outdated
Comment thread internal/service/taskservice/task_test.go Outdated
Comment thread internal/service/taskservice/task_test.go Outdated
Comment thread internal/service/taskservice/task_test.go Outdated
Comment thread internal/service/taskservice/task_test.go Outdated
Comment thread internal/service/vmservice/power.go
Comment thread internal/service/vmservice/vm.go Outdated
Comment thread internal/service/vmservice/vm_test.go
Comment thread internal/service/vmservice/vm_test.go
Comment thread internal/service/vmservice/vm_test.go Outdated
@wikkyk
Copy link
Copy Markdown
Collaborator

wikkyk commented Apr 16, 2026

even though we're only testing invariants

You never know, you know. It's not a bad thing to assert invariants.

65278 and others added 3 commits April 16, 2026 12:54
The state machine was set to the wrong state after reconcilePowerState,
therefore proxmoxmachine.Status.Addresses was never populated.
Our tests for reconcileMachineAddresses succesfully ran as the code
itself was fine. Our E2E tests did not fail because we do not depend
on this field.

To catch these kinds of errors in the future, I've made a new test
for ReconcileVM which tries to test its workings in their entirety.

fix: 710
fix: 714
Exercise task lifecycle via ReconcileInFlightTask, disk resize via
ResizeDisk, and fix a misleading comment.

Co-Authored-By: Claude <noreply@anthropic.com>
65278 and others added 5 commits April 16, 2026 12:54
TaskFailure State is now only set if the action is not qmstart.
It looks impossible to detect a failure in qmstart. This fixes
the state machine transition in ReconcileVM.
Coverage for task.go dropped below the gate after the qmstart fix.
These tests exercise every switch branch but are mostly tautological
since the logic is straightforward and already covered by the
end-to-end ReconcileVM test.

Co-Authored-By: Claude <noreply@anthropic.com>
@wikkyk wikkyk force-pushed the 710_fix_proxmoxmachine_status_addresses branch from 6f06faf to 75dc717 Compare April 16, 2026 10:54
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Collaborator

@wikkyk wikkyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, @mcbenjemaa?

@wikkyk wikkyk merged commit 4d93e9b into main Apr 16, 2026
13 checks passed
@wikkyk wikkyk deleted the 710_fix_proxmoxmachine_status_addresses branch April 16, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants